
    A Comparison of Four Approaches to Discretization Based on Entropy

    We compare four discretization methods, all based on entropy: the original C4.5 approach to discretization, two globalized methods, known as equal interval width and equal frequency per interval, and a relatively new discretization method called multiple scanning, using the C4.5 decision tree generation system. The main objective of our research is to compare the quality of these four methods using two criteria: an error rate evaluated by ten-fold cross-validation and the size of the decision tree generated by C4.5. Our results show that multiple scanning is the best discretization method in terms of the error rate, and that decision trees generated from datasets discretized by multiple scanning are simpler than decision trees generated directly by C4.5 or generated from datasets discretized by either globalized discretization method.
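The two globalized methods are only named in the abstract; as a rough sketch of the underlying single-attribute (local) versions they generalize, with function names and toy data of our own:

```python
def equal_interval_width(values, k):
    """Cut the range of a continuous attribute into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # Interior cut points; a value is then mapped to the interval it falls in.
    return [lo + width * i for i in range(1, k)]

def equal_frequency(values, k):
    """Choose cut points so each interval holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // k] for i in range(1, k)]

def discretize(value, cuts):
    """Return the index of the interval the value falls into."""
    return sum(value >= c for c in cuts)

ages = [19, 21, 22, 24, 30, 31, 45, 50, 61, 70]
print(equal_interval_width(ages, 3))  # [36.0, 53.0]
print(equal_frequency(ages, 3))       # [24, 45]
print(discretize(33, equal_interval_width(ages, 3)))  # 0 (first interval)
```

The entropy-based globalization studied in the paper chooses which attribute to cut next across the whole table; the sketch only shows the interval shapes the two names refer to.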

    The usefulness of a machine learning approach to knowledge acquisition

    This paper presents results of experiments showing how machine learning methods are useful for rule induction in the process of knowledge acquisition for expert systems. Four machine learning methods were used: ID3, ID3 with dropping conditions, and two options of the system LERS (Learning from Examples based on Rough Sets): LEM1 and LEM2. Two knowledge acquisition options of LERS were used as well. All six methods were used for rule induction from six real-life data sets. The main objective was to test how an expert system, supplied with these rule sets, performs without information on a few attributes. Thus the expert system attempts to classify examples in which the values of some attributes are missing. The experiments make clear that all machine learning methods performed much worse than the knowledge acquisition options of LERS. Thus, machine learning methods used for knowledge acquisition should be replaced by other methods of rule induction that generate complete sets of rules. The knowledge acquisition options of LERS are examples of such appropriate ways of inducing rules for building knowledge bases.
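The abstract does not give LERS's rule format; a minimal sketch of why missing attribute values defeat an incomplete rule set (attribute names and rules are invented for illustration):

```python
# A rule is a set of (attribute, value) conditions plus a decision.
rules = [
    ({"temperature": "high", "headache": "yes"}, "flu"),
    ({"temperature": "normal"}, "healthy"),
]

def classify(case, rules):
    """Return the decision of the first rule whose conditions all match.

    An attribute whose value is missing (None) never satisfies a
    condition, so an incomplete rule set may leave the case
    unclassified -- the failure mode the experiments above measure.
    """
    for conditions, decision in rules:
        if all(case.get(a) == v for a, v in conditions.items()):
            return decision
    return None  # no rule matched: the case stays unclassified

print(classify({"temperature": "high", "headache": "yes"}, rules))  # flu
print(classify({"temperature": "high", "headache": None}, rules))   # None
```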

    Global discretization of continuous attributes as preprocessing for machine learning

    Real-life data usually are presented in databases by real numbers. On the other hand, most inductive learning methods require a small number of attribute values. Thus it is necessary to convert input data sets with continuous attributes into input data sets with discrete attributes. Methods of discretization restricted to single continuous attributes will be called local, while methods that simultaneously convert all continuous attributes will be called global. In this paper, a method of transforming any local discretization method into a global one is presented. A global discretization method, based on cluster analysis, is presented and compared experimentally with three known local methods, transformed into global. Experiments include tenfold cross-validation and leaving-one-out methods for ten real-life data sets.

    Partition triples: A tool for reduction of data sets

    Data sets discussed in this paper are presented as tables with rows corresponding to examples (entities, objects) and columns to attributes. A partition triple is defined for such a table as a triple of partitions on the set of examples, the set of attributes, and the set of attribute values, respectively, preserving the structure of the table. The idea of a partition triple is an extension of the idea of a partition pair, introduced by J. Hartmanis and R. E. Stearns in automata theory. Results characterizing partition triples and algorithms for computing partition triples are presented. The theory is illustrated by an example of an application in machine learning from examples. (C) 1996 Academic Press, Inc.
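As a concrete picture of one component of such a triple, the partition on the set of examples induced by a single attribute groups the rows that attribute cannot distinguish (a minimal sketch; the table is invented):

```python
from collections import defaultdict

def partition_by(table, attribute):
    """Group example (row) indices into blocks that agree on the attribute.

    This is the partition on the set of examples induced by one column;
    partitions induced by sets of attributes refine it further.
    """
    blocks = defaultdict(list)
    for i, row in enumerate(table):
        blocks[row[attribute]].append(i)
    return sorted(blocks.values())

table = [
    {"size": "big",   "color": "red"},
    {"size": "small", "color": "red"},
    {"size": "big",   "color": "blue"},
]
print(partition_by(table, "size"))   # [[0, 2], [1]]
print(partition_by(table, "color"))  # [[0, 1], [2]]
```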

    Reduced Data Sets and Entropy-Based Discretization

    This work is licensed under a Creative Commons Attribution 4.0 International License. Results of experiments on numerical data sets discretized using two methods, global versions of Equal Frequency per Interval and Equal Interval Width, are presented. Globalization of both methods is based on entropy. For the discretized data sets, left and right reducts were computed. For each discretized data set and the two data sets based, respectively, on left and right reducts, we applied ten-fold cross-validation using the C4.5 decision tree generation system. Our main objective was to compare the quality of all three types of data sets in terms of an error rate. Additionally, we compared the complexity of the generated decision trees. We show that reduction of data sets may only increase the error rate, and that the decision trees generated from reduced data sets are not simpler than the decision trees generated from non-reduced data sets.
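Ten-fold cross-validation as used throughout these experiments can be sketched as follows (a majority-vote classifier stands in for C4.5, which is not reproduced here; names and data are our own):

```python
import random

def ten_fold_error_rate(examples, train_and_classify, folds=10, seed=0):
    """Estimate an error rate by ten-fold cross-validation.

    Each example is held out exactly once; the classifier is rebuilt on
    the remaining nine folds and scored on the held-out fold.
    """
    data = examples[:]
    random.Random(seed).shuffle(data)
    errors = 0
    for f in range(folds):
        test = data[f::folds]                                  # held-out fold
        train = [e for i, e in enumerate(data) if i % folds != f]
        classify = train_and_classify(train)
        errors += sum(classify(x) != y for x, y in test)
    return errors / len(data)

# Toy classifier: always predict the majority label of the training set.
def majority(train):
    labels = [y for _, y in train]
    winner = max(set(labels), key=labels.count)
    return lambda x: winner

examples = [(i, "a") for i in range(70)] + [(i, "b") for i in range(30)]
print(ten_fold_error_rate(examples, majority))  # 0.3
```

With a 70/30 label split, the majority label wins in every training fold, so every "b" example is misclassified exactly once, giving an error rate of 0.3.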

    Improving prediction of preterm birth using a new classification scheme and rule induction

    Prediction of preterm birth is a poorly understood domain. The existing manual methods of assessment of preterm birth are 17%-38% accurate. The machine learning system LERS was used for three different datasets about pregnant women. Rules induced by LERS were used in conjunction with a classification scheme of LERS, based on the "bucket brigade" algorithm of genetic algorithms and enhanced by partial matching. The resulting prediction of preterm birth in new, unseen cases is much more accurate (68%-90%).

    Entropy of English text: Experiments with humans and a machine learning system based on rough sets

    The goal of this paper is to show the dependency of the entropy of English text on the subject of the experiment, the type of English text, and the methodology used to estimate the entropy. Claude Shannon first described the technique for estimating the entropy of English text by a human subject guessing the next letter after viewing a string of characters taken from actual text. We show how this result is affected by using different humans in the experiment (Shannon used only his wife) and by using different types of text material (Shannon used only a single book). We also show how the results are affected when we replace the human subjects with a machine learning system based on rough sets. Automating the play of the guessing game with this system, called LERS, gives rise to a lossless data compression scheme. (C) Elsevier Science Inc. 1998
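The guessing game can be sketched with a simple frequency model standing in for the human (or LERS) predictor; the text, context length, and summary statistic below are illustrative, not the paper's protocol:

```python
from collections import Counter, defaultdict
from math import log2

def guess_counts(text, order=1):
    """Play the guessing game: rank candidate next letters by how often
    they followed the current context earlier in the text, and record
    how many guesses the true letter takes."""
    follows = defaultdict(Counter)
    counts = []
    for i in range(order, len(text)):
        context, actual = text[i - order:i], text[i]
        ranking = [c for c, _ in follows[context].most_common()]
        unseen = sorted(set(text) - set(ranking))  # never-seen letters go last
        counts.append((ranking + unseen).index(actual) + 1)
        follows[context][actual] += 1
    return counts

counts = guess_counts("the theory of the thing " * 4)
dist = Counter(counts)
n = len(counts)
# Entropy of the guess-number distribution, in bits per character; Shannon
# derived upper and lower bounds on the text entropy from these same counts.
h_guess = -sum((m / n) * log2(m / n) for m in dist.values())
print(round(h_guess, 3))
```

A better predictor concentrates the guess numbers near 1, which is also what makes the automated game usable as a lossless compression scheme: only the guess numbers need to be transmitted.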

    Operation-preserving functions and autonomous factors of finite automata

    The relationship between the structure of autonomous finite automata and their operation-preserving functions is considered. The results suggest ideas for the study of operation-preserving functions of arbitrary finite automata, because with each finite automaton the set of its autonomous factors is associated. Based on this method of investigating the operation-preserving functions of a finite automaton A, and by studying the autonomous factors of A, an algorithm for determining the operation-preserving functions of A is given.
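For the autonomous case the defining condition is small enough to brute-force: an autonomous automaton is just a next-state function delta with no input, and an operation-preserving function is one that commutes with it. A sketch (the state set and transition function are invented):

```python
from itertools import product

def preserves(delta, f, states):
    """Check f(delta(q)) == delta(f(q)) for every state q."""
    return all(f[delta[q]] == delta[f[q]] for q in states)

def all_preserving(delta, states):
    """Brute-force every function on states and keep the preserving ones."""
    found = []
    for images in product(states, repeat=len(states)):
        f = dict(zip(states, images))
        if preserves(delta, f, states):
            found.append(f)
    return found

# A 3-cycle: 0 -> 1 -> 2 -> 0.
delta = {0: 1, 1: 2, 2: 0}
fns = all_preserving(delta, [0, 1, 2])
print(len(fns))  # 3: the operation-preserving functions of a cycle are its rotations
```

The paper's algorithm avoids this exponential enumeration by exploiting structure; the brute-force version only illustrates the property being computed.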